Alpha

What is "Alpha" in an Image?

Alpha (abbreviated A, as in RGBA) determines whether a pixel is transparent or opaque in an image, in image formats that support semi-transparency, such as PNG and WebP, and in image editors that support the concept through a layers system. Typically, alpha is a value from 0 to 255, where 0 means fully transparent and 255 means fully opaque. Every pixel has its own alpha value. In RGB format, every pixel also has a value for red, green, and blue. When we take all the values in an image for only one of these components, we call it a color channel (e.g. the red channel). For alpha, this is usually called the "alpha channel," or sometimes the "alpha color channel" (I'm not sure if alpha counts as a color).

The math done on alpha uses normalized fractional values (from 0% to 100%, or 0.0 to 1.0). In 32 bpp images with 4 channels (RGBA), each channel gets 8 bits per pixel, which is why the values go from 0 to 255. Here, 255 is 100%. If we used 16 bits per channel, the values would go from 0 to 65535, and 65535 would be 100%. In any case, we can have (almost) any value from 0% to 100%: there are rounding errors. For example, we can't have a pixel that is exactly 50% transparent, because 255 divided by 2 is 127.5, which isn't an integer, so we can't store it as the pixel's alpha. It will have to be 127 (49.8%) or 128 (50.2%).
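
To make the rounding concrete, here's a minimal Python sketch (the function names are my own) of the conversion between 8-bit alpha and normalized fractions:

def alpha_to_fraction(a8):
    # Convert an 8-bit alpha value (0-255) to a normalized fraction.
    return a8 / 255.0

def fraction_to_alpha(fraction):
    # Convert a normalized fraction back to 8 bits, rounding to nearest.
    return round(fraction * 255.0)

print(fraction_to_alpha(0.5))  # 128 (0.5 * 255 = 127.5, which rounds to 128)
print(alpha_to_fraction(127))  # 0.498... (49.8%)
print(alpha_to_fraction(128))  # 0.50196... (50.2%)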

Alpha Blending

The math used for blending transparent images is very simple. Assume we have images X and Y. Image X is opaque. Image Y has transparent pixels. In order to apply Y on top of X, we would perform the following operation for each pixel:

X_share = 100% - Y_alpha
Y_share = Y_alpha
new_X_rgb = X_rgb × X_share + Y_rgb × Y_share

When Y_alpha is 0%, new_X_rgb is composed 100% of X_rgb and 0% of Y_rgb. As Y_alpha increases, the amount the final value takes from X decreases, while the amount it takes from Y increases. Observe that 100% - Y_alpha + Y_alpha is always 100%, because -Y_alpha and +Y_alpha cancel each other out. Thus, X_share + Y_share also equals 100% every time.

All we want to do is add Y_rgb to X_rgb in proportion to its opacity (Y_alpha). The reason we need this more complicated math is that if we just added the values, the RGB values could only go up, never down. For example, if we used a black image for Y, then X_rgb + (0, 0, 0) = X_rgb: the final image wouldn't change at all. This is actually called the "addition" blend mode in image editors that support layer blend modes. In order for black to make the composition darker, we need to reduce the amount X_rgb contributes to the composition in proportion opposite to the amount of alpha Y has.

The operation above is often simplified as:

new_X_rgb = X_rgb × (1.0 - Y_alpha) + Y_rgb × Y_alpha
A diagram of the typical alpha blending function.
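
As a sketch, the per-pixel operation could look like this in Python (the names are my own; alpha is given as a 0.0 to 1.0 fraction):

def blend_pixel(x_rgb, y_rgb, y_alpha):
    # "Normal" blending: X is the opaque background, Y is the
    # semi-transparent foreground, y_alpha is a fraction (0.0-1.0).
    return tuple(x * (1.0 - y_alpha) + y * y_alpha
                 for x, y in zip(x_rgb, y_rgb))

print(blend_pixel((0, 0, 0), (255, 255, 255), 0.5))  # (127.5, 127.5, 127.5)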

Pre-Multiplied Alpha

Although alpha is generally used for transparency, that isn't always the case. Some software that works with images can use the alpha channel for other purposes. This immediately runs into issues, because most image editors work with pre-multiplied alpha when compositing layers, which turns fully transparent pixels into black pixels.

Let's take another look at the formula above.

new_X_rgb = X_rgb × (1.0 - Y_alpha) + Y_rgb × Y_alpha

We can see above that Y_rgb has to be multiplied by its alpha. This operation has to be done for every single pixel. If we need to do alpha blending many times, as is the case with video games, for example, we can speed this up by creating an image with the RGB values already multiplied by their alpha. That way, the multiplication is done once instead of on every blend. Then we would need to update our formula to:

new_X_rgb = X_rgb × (1.0 - Y_alpha) + premul_Y_rgb

The reason why transparent pixels become black is that if Y_alpha is 0, then Y_rgb × Y_alpha = 0, 0, 0. The pre-multiplied color of a transparent pixel is always going to be black.
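
A minimal sketch of the same idea, assuming the foreground image was premultiplied once up front:

def premultiply(rgb, alpha):
    # Done once, when the image is created or loaded.
    return tuple(c * alpha for c in rgb)

def blend_premultiplied(x_rgb, premul_y_rgb, y_alpha):
    # The per-pixel blend no longer needs to multiply Y by its alpha.
    return tuple(x * (1.0 - y_alpha) + py
                 for x, py in zip(x_rgb, premul_y_rgb))

print(premultiply((255, 0, 0), 0.0))  # (0.0, 0.0, 0.0): transparent red becomes black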

Note that we still need to keep the alpha values of the image in order to reduce X's contribution to the final composition. This is because we're doing "normal" blending. If we were doing "addition" blending, we could get rid of the alpha channel entirely in this case if we had already pre-multiplied it. Lots of games use this blending to create "fire" particles in graphical effects, for example. Then our formula would look like this:

new_X_rgb = X_rgb + premul_Y_rgb
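
As a Python sketch, the additive case reduces to a plain sum; the only extra detail is clamping, since a sum can exceed the maximum channel value:

def blend_additive(x_rgb, premul_y_rgb):
    # "Addition" blend mode: no alpha channel needed at blend time.
    return tuple(min(255, x + py) for x, py in zip(x_rgb, premul_y_rgb))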

Each of these formulas is different from the others. Naturally, this means that a formula that works with images that have pre-multiplied alpha won't work with images that do NOT have pre-multiplied alpha, and vice versa. The developer has to know which kind of image they're working with.

Linear Color Blending and Gamma Correction

In most cases, the color blending performed by the functions above produces poor results, because it is applied to gamma corrected RGB values.

In order for us humans to perceive color, our eyes must react to stimuli, more specifically, to certain amounts of photons of light that reach our eyes. However, the amount of brightness that we perceive doesn't scale linearly with the amount of photons we see. Our eyes are sensitive to small amounts of light, so there's a great difference between no light and a little light, compared to increasing amounts of light. On top of that, the amount of electric power a monitor requires to emit photons won't necessarily scale linearly either. As if that wasn't enough, light isn't even divided into "RGB" lights. There is only one light, and the colors are merely different wavelengths of light.

RGB color is just lies upon lies upon lies.

In general, the RGB values we find in images are gamma corrected. This comes from the era of CRT monitors. The amount of energy needed to emit different amounts of light didn't scale linearly in CRTs, so in order to make images look correct on those monitors, they were stored with the RGB values already "corrected" for display.

In the old days of digital imaging most monitors were cathode-ray tube (CRT) monitors. These monitors had the physical property that twice the input voltage did not result in twice the amount of brightness. Doubling the input voltage resulted in a brightness equal to an exponential relationship of roughly 2.2 known as the gamma of a monitor. This happens to (coincidently) also closely match how human beings measure brightness as brightness is also displayed with a similar (inverse) power relationship.

https://learnopengl.com/Advanced-Lighting/Gamma-Correction (accessed 2024-07-09)

For example, if the CRT gamma is 2.2, and you feed it 50% of its maximum input voltage, the math would be 0.5^2.2 ≈ 0.21. In other words, the CRT monitor outputs only 21% brightness with 50% input.

I'm not sure where it comes from exactly, but the term "18% gray" in photography shares the same idea: when we correct for (some amount of) gamma, the "mid-gray" becomes "18% gray."

Note that 0.0^2.2 = 0.0 and 1.0^2.2 = 1.0, so 0% (black) and 100% (white) don't change with this function; only the values in between change.

How do we make the monitor emit 50% gray, then? We need to correct the gamma. This is done by inverting the exponent, like this:

corrected_color = color^(1 ÷ gamma)
color = corrected_color^gamma

In our case: 0.5^(1 ÷ 2.2) ≈ 0.72. If we feed the monitor 72% of its maximum input voltage, it should output 50% brightness.
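
Here is the pair of operations as a Python sketch, using the simple power-law model from above rather than the exact piecewise sRGB curve:

def gamma_encode(linear, gamma=2.2):
    # "Correct" a linear light value for display.
    return linear ** (1.0 / gamma)

def gamma_decode(corrected, gamma=2.2):
    # Undo the correction, recovering linear light.
    return corrected ** gamma

print(gamma_encode(0.5))  # ~0.7297, the ~72% from above
print(gamma_decode(0.5))  # ~0.2176, the ~21% from the CRT example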

In essence, what this means is that the RGB values we work with don't map linearly to how light works, and this creates distortions when we blend two colors using alpha.

For example, if we had a black color and we applied a white color on top of it with 50% alpha, what should the result be? What we are doing is the linear interpolation of values: 0, 0, 0 and 255, 255, 255, so we'll get 127, 127, 127, rounded down to the nearest integer. By itself, there is no problem with this math.

When we feed this to the monitor, the monitor will output 21% brightness, not 50% brightness.

By coincidence, our eyes' sensitivity to low amounts of light follows a similar curve, so this 21% brightness will look like mid-gray to us, while 50% brightness would look like light gray.
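
Reusing the gamma functions from the sketch above, we can contrast blending directly on gamma corrected values against decoding to linear light first (again assuming the simple 2.2 power law):

# Lerp 50% white over black directly on gamma corrected values:
naive = 0.0 * 0.5 + 1.0 * 0.5        # 0.5, stored as 127 or 128
print(gamma_decode(naive))           # ~0.22 emitted light: mid-gray to the eye

# Decode to linear light, lerp, then re-encode for display:
linear = gamma_decode(0.0) * 0.5 + gamma_decode(1.0) * 0.5  # 0.5 linear light
print(gamma_encode(linear))          # ~0.73, stored as ~186: true 50% brightness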

By the sound of it, the two complicated math problems corrected each other, so I guess it's fine?

In this simple example, I guess you could say that.

But if you're working with multiple semi-transparent effects that apply on top of each other, such as 3D lighting, or even if you are a 2D illustrator who paints with semi-transparent brushes, these distortions in the color space will add up. Colors will look subtly weird, especially when blending bright colors or creating gradients of shading from dark colors to bright colors, and you won't even understand why.

A side effect of working with linear RGB values is that most applications blend gamma corrected values, so the two approaches end up incompatible with each other, producing different results. For example, if you have a semi-transparent drawing in Inkscape on a white background, and you export it with a transparent background as PNG, then import the transparent PNG into GIMP and add a white background in GIMP, you would expect the drawing to look exactly the same in both Inkscape and GIMP. However, since version 2.10, GIMP uses linear blending by default, while Inkscape operates in gamma corrected space. Consequently, the semi-transparent pixels will look brighter in GIMP than they do in Inkscape. A similar problem occurs if you try to export a transparent PNG from GIMP to display it in a web browser or practically any other software: because they work on gamma corrected colors, the semi-transparent pixels will always look darker than they do in GIMP.

In Krita, the default RGB color space is gamma corrected. This is also called the "sRGB" profile, or "perceptual RGB." It's possible to specify a linear RGB profile when creating a document. In Krita, some tools, like the gradient tool, will still use linear RGB even in non-linear color spaces. I'm not really sure why.

Luminance Blending

We perceive pure red, pure green, and pure blue as having different levels of brightness: blue is darker than red, and green is brighter than red. Consequently, it's difficult to tell what the actual brightness of something is by looking at its RGB values.

The solution for this is to use a non-RGB color model, such as L*a*b*, YUV, or YCbCr. In these color models, the first component stands for "lightness" or "luma," while the other two carry chrominance (chroma). Blending values in these color models will always make more sense than blending them in RGB. Unfortunately, we still need to convert back to RGB to display them on the screen, so it may not make a lot of sense for real-time graphics, as there are performance costs involved.

When performing alpha blending in these color models, we will often see colors "shift" in hue in ways that wouldn't make sense in RGB. For example, if we blend white with red, we'll see a shift toward orange. When blending white with green, we'll see a shift toward yellow. More specifically, what happens in this case is that, because blue is darker, it takes longer to get mixed in when increasing lightness. If we blended toward black, we would see the opposite effect.

This is similar to "warm colors" and "cold colors" in color theory.

I'm not sure, but I assume it's simply not physically possible to reach certain luminance levels with certain hues. Let's see an example to understand this better.

Yellow has the highest luminance of all basic colors. If you have yellow, you have high luminance. In RGB, yellow is R + G, so it makes sense that it would be brighter than R or G alone. However, yellow has higher luminance than cyan (G + B), which is higher than purple (B + R). What would happen if we blended yellow with purple? Since yellow has 100% red and purple has 100% red, the red component never changes. The green component goes down, and, at the exact same rate, the blue component would go up. Because these components have different luminance, we end up with a non-linear blending of luminance in the middle of the gradient. In practice what this means is that the middle of the gradient will look lighter than expected in linear RGB or more saturated than expected in sRGB. On the other hand, if we did this in LAB, the luminance would be interpolated linearly, which looks better. To do this, the RGB components are blended non-linearly so that the final luminance is correct.
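
To put rough numbers on this, here's a Python sketch computing CIE lightness (L*) for the gradient's endpoints and midpoint; the 2.2 power law, the Rec. 709 luminance weights, and the simplified L* formula (which ignores its small linear segment near black) are my choices for illustration:

def srgb_to_linear(c, gamma=2.2):
    # Approximate sRGB decoding with a plain power law.
    return c ** gamma

def luminance(r, g, b):
    # Relative luminance from linear RGB (Rec. 709 weights).
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def lightness(y):
    # CIE L* from relative luminance, simplified (no linear segment).
    return 116.0 * y ** (1.0 / 3.0) - 16.0

yellow, purple = (1.0, 1.0, 0.0), (1.0, 0.0, 1.0)

for name, rgb in (("yellow", yellow), ("purple", purple)):
    print(name, round(lightness(luminance(*map(srgb_to_linear, rgb))), 1))
    # yellow ~97.1, purple ~60.3

# Midpoint of the gradient, lerped directly on the sRGB values:
mid = tuple((a + b) / 2 for a, b in zip(yellow, purple))  # (1.0, 0.5, 0.5)
print(round(lightness(luminance(*map(srgb_to_linear, mid))), 1))
# ~68.3: noticeably darker than the ~78.7 that LAB blending gives,
# since LAB interpolates L* itself linearly.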

A comparison of sRGB (perceptual RGB, i.e. in gamma corrected color space), CIELAB color model, and linear RGB color blending created with a yellow-purple gradient.

Banding & Rounding Errors

When performing a large number of blending operations, rounding errors can accumulate and result in a "banding" effect. The main reason for this is that most applications work with a color depth of 8 bits per color channel per pixel, i.e. the values range from 0 to 255.

Normally, this wouldn't be an issue. However, if you keep converting from one color model or color space to another, rounding errors accumulate because, simply put, there are too few numbers, so values in one space don't map perfectly onto another space.

A simple example is HSL. The "hue" in HSL goes from 0 to 360 degrees, whereas lightness goes from 0 to 100. If 0 lightness is 0, 0, 0 in RGB, and 100 lightness is 255, 255, 255 in RGB, that means, interpolating linearly, that 1 lightness is "2.55" for each RGB component, which we would round up to 3. In other words, the RGB tuples 1, 1, 1 and 2, 2, 2 fall between 0 and 1 lightness in HSL, and only 3, 3, 3 can be represented in this manner.
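
A quick Python sketch of the gap (my own construction):

# Which 8-bit gray levels are reachable from integer HSL lightness (0-100)?
reachable = {round(l * 255 / 100) for l in range(101)}
print(sorted(reachable)[:5])  # [0, 3, 5, 8, 10]: gray levels 1 and 2 are skipped
print(len(reachable))         # 101 of the 256 possible gray levels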

The same idea applies when we convert sRGB to linear RGB, or to LAB: some values will be "skipped."

For screen display, 8 bits per channel is good enough in most cases, but the same can't be said about image editing.

One strategy is to work with a higher color depth, such as 16 bits per channel, and then export to 8 bits per channel. The problem of rounding errors still exists, but because we're working with 16 bits, our values range from 0 to 65535. If one or two values get "skipped," it won't matter, because we'll scale down to 0 to 255 later: it would take an error of around 256 in 16-bit for the 8-bit value to change by 1.
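
A small Python sketch of why the headroom helps (the numbers are illustrative):

v8 = 100
v16 = v8 * 257           # scale 0-255 up to 0-65535 (255 * 257 = 65535)
v16 += 120               # pretend rounding errors accumulated 120 steps
print(round(v16 / 257))  # still 100: the error disappears on the way back down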
